
    Γ-stochastic neighbour embedding for feed-forward data visualization

    t-distributed Stochastic Neighbour Embedding (t-SNE) is one of the most popular nonlinear dimension reduction techniques used in multiple application domains. In this paper we propose a variation on the embedding neighbourhood distribution, resulting in Γ-SNE, which can construct a feed-forward mapping using an RBF network. We compare the visualizations generated by Γ-SNE with those of t-SNE and provide empirical evidence suggesting the network is capable of robust interpolation and automatic weight regularization.
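The feed-forward idea above can be illustrated with a minimal sketch: given embedding coordinates already computed for a training set (e.g. by t-SNE or Γ-SNE), fit an RBF layer that maps new high-dimensional points directly into the embedding. The function names and the use of training points as RBF centres are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def fit_rbf_map(X, Y, centres, sigma=1.0, ridge=1e-6):
    """Fit output weights W so that Phi(X) @ W ~ Y, where Phi is a
    Gaussian RBF layer. X: (n, d) high-dimensional inputs; Y: (n, 2)
    embedding coordinates from a precomputed t-SNE/Γ-SNE run.
    (Illustrative sketch, not the paper's training procedure.)"""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    Phi = np.exp(-d2 / (2.0 * sigma ** 2))
    # Ridge-regularised least squares for the linear output layer
    W = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(len(centres)), Phi.T @ Y)
    return W

def rbf_map(Xnew, centres, W, sigma=1.0):
    """Feed-forward mapping: embed unseen points without re-running t-SNE."""
    d2 = ((Xnew[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2)) @ W
```

Once fitted, `rbf_map` embeds out-of-sample points in a single forward pass, which is what makes interpolation over new data possible.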

    Convolutional LSTM Networks for Subcellular Localization of Proteins

    Machine learning is widely used to analyze biological sequence data. Non-sequential models such as SVMs or feed-forward neural networks are often used, although they have no natural way of handling sequences of varying length. Recurrent neural networks such as the long short-term memory (LSTM) model, on the other hand, are designed to handle sequences. In this study we demonstrate that LSTM networks predict the subcellular location of proteins given only the protein sequence with high accuracy (0.902), outperforming current state-of-the-art algorithms. We further improve the performance by introducing convolutional filters and experiment with an attention mechanism which lets the LSTM focus on specific parts of the protein. Lastly, we introduce new visualizations of both the convolutional filters and the attention mechanisms and show how they can be used to extract biologically relevant knowledge from the LSTM networks.
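The attention mechanism described above can be sketched as a softmax-weighted pooling of the per-residue LSTM hidden states; the resulting weights are exactly what the paper's visualizations plot along the protein. The function names and the single scoring vector `w` are simplifying assumptions, not the authors' exact parameterization.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_pool(H, w):
    """Collapse per-residue hidden states H (seq_len, hidden) into one
    context vector using a scoring vector w (hidden,).
    Returns the context vector and the attention weights; the weights
    show how much each sequence position contributes and can be
    plotted along the protein. (Illustrative sketch.)"""
    a = softmax(H @ w)      # one weight per sequence position
    return a @ H, a         # weighted sum of hidden states, plus weights
```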

    Visual analytics for collaborative human-machine confidence in human-centric active learning tasks

    Active machine learning is a human-centric paradigm that leverages a small labelled dataset to build an initial weak classifier that can then be improved over time through human-machine collaboration. As new unlabelled samples are observed, the machine can either provide a prediction, or query a human ‘oracle’ when the machine is not confident in its prediction. Of course, just as the machine may lack confidence, the same can also be true of a human ‘oracle’: humans are not all-knowing, untiring oracles. A human’s ability to provide an accurate and confident response will often vary between queries, according to the duration of the current interaction, their level of engagement with the system, and the difficulty of the labelling task. This poses an important question of how uncertainty can be expressed and accounted for in a human-machine collaboration. In short, how can we facilitate a mutually transparent collaboration between two uncertain actors - a person and a machine - that leads to an improved outcome? In this work, we demonstrate the benefit of human-machine collaboration within the process of active learning, where limited data samples are available or where labelling costs are high. To achieve this, we developed a visual analytics tool for active learning that promotes transparency, inspection, understanding and trust of the learning process through human-machine collaboration. Confidence is fundamental to this collaboration: both parties can report their level of confidence during active learning tasks using the tool, such that this can be used to inform learning. Human confidence of labels can be accounted for by the machine, the machine can query for samples based on confidence measures, and the machine can report confidence of current predictions to the human, to further the trust and transparency between the collaborative parties.
In particular, we find that this can improve the robustness of the classifier when incorrect sample labels are provided due to low confidence or fatigue. Reported confidences can also better inform human-machine sample selection in collaborative sampling. Our experimentation compares the impact of different selection strategies for acquiring samples: machine-driven, human-driven, and collaborative selection. We demonstrate how a collaborative approach can improve trust in the model's robustness, achieving high accuracy and low user correction with only limited data sample selections.
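The machine-driven side of the query loop described above can be sketched as least-confidence sampling: samples whose top predicted class probability falls below a threshold are sent to the oracle, least confident first. The function name, the threshold value, and the budget parameter are illustrative assumptions, not the tool's exact strategy.

```python
import numpy as np

def select_queries(proba, machine_conf_threshold=0.75, budget=5):
    """Pick the unlabelled samples the machine should send to the oracle.

    proba: (n_samples, n_classes) predicted class probabilities.
    Samples whose top-class probability falls below the threshold are
    candidates; the least confident ones are queried first, up to
    `budget`. (Illustrative least-confidence sketch.)"""
    confidence = proba.max(axis=1)                       # machine confidence
    candidates = np.where(confidence < machine_conf_threshold)[0]
    order = candidates[np.argsort(confidence[candidates])]
    return order[:budget]
```

In a collaborative variant, the same ranking could be re-weighted by the human's reported confidence, so that queries avoid samples the oracle is unlikely to label reliably.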

    A unified data representation theory for network visualization, ordering and coarse-graining

    Representation of large data sets has become a key question in many scientific disciplines over the last decade. Several approaches to network visualization, data ordering and coarse-graining have addressed this goal; however, there was no underlying theoretical framework linking these problems. Here we show an elegant, information-theoretic data representation approach as a unified solution to network visualization, data ordering and coarse-graining. The optimal representation is the one hardest to distinguish from the original data matrix, as measured by the relative entropy. The representation of network nodes as probability distributions provides an efficient visualization method and, in one dimension, an ordering of network nodes and edges. Coarse-grained representations of the input network enable both efficient data compression and hierarchical visualization, achieving high-quality representations of larger data sets. Our unified data representation theory will help the analysis of extensive data sets by revealing the large-scale structure of complex networks in a comprehensible form.
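The optimality criterion above rests on the relative entropy (Kullback-Leibler divergence) between the original data matrix and its representation. A minimal sketch, treating both non-negative matrices as probability distributions over their entries (the function name and normalization choice are assumptions for illustration):

```python
import numpy as np

def relative_entropy(P, Q, eps=1e-12):
    """KL divergence D(P || Q) between two non-negative matrices,
    each normalised to a probability distribution over its entries.
    Zero iff the representation is indistinguishable from the data;
    larger values mean the representation is easier to tell apart."""
    p = P.ravel() / P.sum()
    q = Q.ravel() / Q.sum()
    # eps guards against log(0) for empty cells
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

Under this measure, selecting the representation that minimises `relative_entropy(data, representation)` picks the one hardest to distinguish from the original matrix, which is the paper's stated criterion.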

    Comparative study of unsupervised dimension reduction techniques for the visualization of microarray gene expression data

    Background: Visualization of DNA microarray data in two- or three-dimensional spaces is an important exploratory analysis step in order to detect quality issues or to generate new hypotheses. Principal Component Analysis (PCA) is a widely used linear method to define the mapping between the high-dimensional data and its low-dimensional representation. During the last decade, many new nonlinear methods for dimension reduction have been proposed, but it is still unclear how well these methods capture the underlying structure of microarray gene expression data. In this study, we assessed the performance of the PCA approach and of six nonlinear dimension reduction methods, namely Kernel PCA, Locally Linear Embedding, Isomap, Diffusion Maps, Laplacian Eigenmaps and Maximum Variance Unfolding, in terms of visualization of microarray data.
    Results: A systematic benchmark, consisting of Support Vector Machine classification, cluster validation and noise evaluations, was applied to ten microarray and several simulated datasets. Significant differences between PCA and most of the nonlinear methods were observed in two- and three-dimensional target spaces. With an increasing number of dimensions and an increasing number of differentially expressed genes, all methods showed similar performance. PCA and Diffusion Maps were less sensitive to noise than the other nonlinear methods.
    Conclusions: Locally Linear Embedding and Isomap showed superior performance on all datasets. In very low-dimensional representations and with few differentially expressed genes, these two methods preserve more of the underlying structure of the data than PCA, and are thus favorable alternatives for the visualization of microarray data.
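The linear baseline in the comparison above, PCA, reduces to an SVD of the centred data matrix. A minimal sketch of the 2-D projection used for visualization (the function name is an assumption; this is the standard method, not the study's specific pipeline):

```python
import numpy as np

def pca_project(X, k=2):
    """Project rows of X (samples x genes) onto the top-k principal
    components for visualization. Rows of Vt from the SVD of the
    centred matrix are the principal axes, ordered by variance."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T
```

The nonlinear methods in the study (Isomap, LLE, etc.) replace this global linear projection with embeddings that preserve local neighbourhood structure, which is why they can outperform PCA in very low-dimensional target spaces.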

    T cell cytolytic capacity is independent of initial stimulation strength.

    How cells respond to myriad stimuli with finite signaling machinery is central to immunology. In naive T cells, the inherent effect of ligand strength on activation pathways and endpoints has remained controversial, confounded by environmental fluctuations and intercellular variability within populations. Here we studied how ligand potency affected the activation of CD8+ T cells in vitro, through the use of genome-wide RNA, multi-dimensional protein and functional measurements in single cells. Our data revealed that strong ligands drove more efficient and uniform activation than did weak ligands, but all activated cells were fully cytolytic. Notably, activation followed the same transcriptional pathways regardless of ligand potency. Thus, stimulation strength did not intrinsically dictate the T cell-activation route or phenotype; instead, it controlled how rapidly and simultaneously the cells initiated activation, allowing limited machinery to elicit wide-ranging responses.

    Spatial maps of prostate cancer transcriptomes reveal an unexplored landscape of heterogeneity

    Intra-tumor heterogeneity is one of the biggest challenges in cancer treatment today. Here we investigate tissue-wide gene expression heterogeneity throughout a multifocal prostate cancer using spatial transcriptomics (ST) technology. Utilizing a novel approach for deconvolution, we analyze the transcriptomes of nearly 6750 tissue regions and extract distinct expression profiles for the different tissue components, such as stroma, normal and PIN glands, immune cells and cancer. We distinguish healthy and diseased areas and thereby provide insight into gene expression changes during the progression of prostate cancer. Compared to pathologist annotations, we delineate the extent of cancer foci more accurately, interestingly without a link to histological changes. We identify gene expression gradients in stroma adjacent to tumor regions that allow for re-stratification of the tumor microenvironment. The establishment of these profiles is a first step towards an unbiased view of prostate cancer and can serve as a dictionary for future studies.
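One standard way to perform the kind of deconvolution described above, factoring spot-level expression into component-specific profiles, is non-negative matrix factorization. This is a generic sketch using Lee-Seung multiplicative updates, not the paper's novel method; the function name and parameters are assumptions.

```python
import numpy as np

def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Factor a non-negative expression matrix V (spots x genes) into
    W (spots x k component weights) and H (k x genes component
    expression profiles) by minimising squared reconstruction error
    with multiplicative updates. (Generic NMF sketch, not the
    paper's deconvolution approach.)"""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(iters):
        # Multiplicative updates keep W and H non-negative throughout
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Each row of `H` then plays the role of a tissue-component profile (stroma, gland, immune, cancer), while `W` gives each spot's mixture weights over those components.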